#https://datatables.net/reference/option/
options(DT.options = list(scrollX = TRUE, paging = TRUE, fixedHeader = TRUE, searchHighlight = TRUE))

BASICS

Chapter 4: The Ames housing data

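# Packages the code below relies on (assumed; the setup chunk is not shown in these notes)
library(tidyverse)   # dplyr verbs, ggplot2, %>%
library(tidymodels)  # rsample (initial_split), modeldata (ames)
library(lubridate)   # is.Date
library(janitor)     # clean_names
library(plotly)      # plot_ly, ggplotly

data(ames)
a = ames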

a = a %>%
  clean_names() %>% 
  select(sort(tidyselect::peek_vars())) %>% 
  select(
    where(is.Date),
    where(is.character),
    where(is.factor),
    where(is.numeric)
    )

a %>% head %>% DT::datatable()

4.1 Exploring important features

Distribution of the outcome variable, sale price

a %>% plot_ly(x = ~sale_price) %>% add_boxplot()
a %>% plot_ly(x = ~sale_price) %>% add_histogram()

1. Sale price is right-skewed, with many high outliers; we should log-transform it.
2. Note: log-transforming the outcome decreases the interpretability of the data.

ggplotly(a %>% ggplot(aes(sale_price)) + geom_histogram(bins = 50) + scale_x_log10())
a$sale_price = log10(a$sale_price)
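
A minimal sketch of why the log helps, using simulated right-skewed values rather than the Ames data. The `skewness()` helper and the `price` vector are hypothetical, introduced only for illustration; a skewness near 0 means the distribution is roughly symmetric.

```r
# Simulated right-skewed "prices" (hypothetical data, not the Ames set)
set.seed(1)
price <- rlnorm(1000, meanlog = 12, sdlog = 0.4)

# Moment-based sample skewness: close to 0 for a symmetric distribution
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(price)         # clearly positive: long right tail
skew_log <- skewness(log10(price))  # much closer to 0 after the transform

c(raw = skew_raw, log10 = skew_log)
```

The same comparison can be run on `ames$Sale_Price` before it is overwritten with its log.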

Transforming the outcome variable will probably result in better models than using the untransformed data.

The downside to transforming the outcome is mostly related to interpretation.
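
A small sketch of what the interpretation cost looks like in practice, assuming predictions are made on the log10 scale. The numbers in `pred_log10` are made up for illustration: values must be back-transformed with `10^x` to read as dollars, and a difference on the log10 scale is a ratio, not a dollar amount, on the original scale.

```r
# Hypothetical predictions on the log10 scale (illustrative numbers only)
pred_log10 <- c(5.0, 5.3)

# Undo the transform to get back to the original units
pred_dollars <- 10^pred_log10

# A difference of d on the log10 scale is a ratio of 10^d on the original scale
ratio <- 10^(pred_log10[2] - pred_log10[1])

pred_dollars
ratio
```

So an error of 0.3 on the log10 scale means the prediction is off by roughly a factor of two, whatever the price level.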

Chapter 5: Spending (Splitting) our data

5.1 Common methods for splitting data

set.seed(123)

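# `prop` (not `prob`) sets the share of rows assigned to training, here about 80% of the 2,930 rows.
# `strata` bins sale_price so the training and test sets have similar outcome distributions.
(split = a %>% initial_split(prop = 0.8, strata = sale_price))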
train = training(split)
test = testing(split)
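
A rough base-R sketch of the idea behind a stratified split: bin the outcome into quartiles and sample the same fraction from each bin. This is not rsample's implementation; the simulated `y` and the names `bins`, `train_idx`, `train_y`, and `test_y` are assumptions made for illustration.

```r
set.seed(123)

# Hypothetical stand-in for the outcome column (not the Ames data)
y <- rlnorm(1000, meanlog = 12, sdlog = 0.4)

# Bin the outcome into quartiles
bins <- cut(y, breaks = quantile(y, probs = seq(0, 1, 0.25)), include.lowest = TRUE)

# Sample 80% of the row indices within each bin
train_idx <- unlist(lapply(split(seq_along(y), bins), function(idx) {
  idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
}))

train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Share of each quartile that ended up in the training set
table(bins[train_idx]) / table(bins)
```

Because every quartile contributes the same share, the expensive tail cannot end up over- or under-represented in the training set by chance.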

Chapter 6: Feature engineering with recipes

Chapter 7: Fitting models with parsnip

Chapter 8: A model workflow

Chapter 9: Judging model effectiveness

TOOLS FOR CREATING EFFECTIVE MODELS

Chapter 10: Resampling for evaluating performance

Chapter 11: Comparing models with resampling

Chapter 12: Model tuning and the dangers of overfitting

Chapter 15: Explaining models and predictions